Abstract

Analysis of statistical data privacy has emerged as an important area of research. In this work we design
algorithms to test privacy guarantees of a given Algorithm A executing on a data set $\mathcal{D}$ which contains
potentially sensitive information about individuals. We design an efficient algorithm Atest which can verify
whether A satisfies a generalized differential privacy guarantee. Generalized differential privacy [BBG+11] is a
relaxation of the notion of differential privacy initially proposed by [DMNS06]. By now differential privacy is
the most widely accepted notion of statistical data privacy.

To design Algorithm Atest, we show a new connection between the differential privacy guarantee and the
Lipschitz property of a given function. More specifically, we show that an efficient algorithm for testing the
Lipschitz property can be transformed into Atest, which can test for generalized differential privacy. Lipschitz
property testing and its variants, first studied by [JR11], have been explored by many works [JR11, AJMR12b,
AJMR12a, CS12] because of their intrinsic connection to data privacy as highlighted by [JR11]. Developing a
Lipschitz property tester with an explicit application in privacy has been an intriguing problem since the work
of [JR11]. In our work, we present such a direct application of a Lipschitz tester to testing privacy. We provide
concrete instantiations of Lipschitz testers (over both the hypercube and the hypergrid domains) which are used
in Atest to test for privacy of Algorithm A when the underlying data set $\mathcal{D}$ is drawn from the hypercube and
the hypergrid domains respectively.

Apart from showing a direct connection between testing of privacy and Lipschitzness testing, we generalize
the work of [JR11] to the setting of distribution property testing. We design an efficient Lipschitz testing
algorithm when the distribution over the domain points is not uniform. More precisely, we design an efficient
Lipschitz tester for the case where the domain points are drawn from the hypercube according to some fixed
product distribution. This result is of independent interest to the property testing community. It is important
to note that, to the best of our knowledge, our result on Lipschitz testing over product distributions is the only
positive result in the property testing literature for non-uniform distributions after [AC06].







1 Introduction 



Consider a data sharing platform like BlueKai, TellApart or Criteo. These platforms extensively collect and share
user data with third parties (e.g., advertisers) to enhance specific user experience (e.g., better behavioral targeting).
The third-party applications then use these data to train their machine learning algorithms for better prediction
abilities. Since the data which gets shared is extremely rich in user information, it immediately poses privacy
concerns over the user information [Kor10, CKN+11]. One way to address the privacy concerns due to the
third-party learning algorithms is to train the third-party algorithms "in-house", i.e., within the data sharing
platform itself, thus making sure that the trained machine learning model preserves privacy of the underlying
training data. In this paper, we study a theoretical abstraction of the above mentioned problem.

Let $\mathcal{D}$ be a data set where each record corresponds to a particular user and contains potentially sensitive
information about the user (for example, the click history of the user for a set of advertisements displayed). Let
A be an algorithm that we would like to execute on the data set $\mathcal{D}$ (possibly to obtain some global trends about
the users in $\mathcal{D}$) without compromising any individual's privacy. This challenging problem has recently received a
lot of attention in the form of theoretical investigation in determining the privacy-utility trade-offs for various
old and new algorithms. However, even if an algorithm is provably "safe", in practice the algorithm will be
implemented in a programming language, and the implementation may originate from an untrusted third party.
This brings its own set of challenges and has primarily been addressed in the following way: transform the
algorithm A into a variant which provably satisfies some theoretically sound notion of data privacy (e.g.,
differential privacy [DMNS06]) either by syntactic manipulation (e.g., [McS09, RP10]) or by doing so in some
algorithmic/systems framework (e.g., [NRS07, JR11, MTS+12, RSK+10]). While each approach has its own appeal,
they all have a few shortcomings. For example, they suffer from weak utility guarantees [NRS07, MTS+12, RSK+10],
or take prohibitively large running time [JR11], or require the use of specialized syntax [McS09, RP10], making
it somewhat nontrivial for a non-privacy expert to produce an effective transformation.

In this work, we take a new approach to the above problem which we call privacy testing. Specifically, we
initiate the study of testing whether an input algorithm A satisfies statistical privacy guarantees. We do this by
formulating the problem in the well-studied framework of property testing [RS96a, GGR98a].



Privacy testing. Before we execute an Algorithm A which claims to satisfy a pre-approved notion of privacy, we
test for the validity of such a claim. To the best of our knowledge, ours is the first work to study this approach.
More precisely, in this work we initiate the study of testing an algorithm A for differential privacy guarantees.
Differential privacy in the recent past has become a well established notion of privacy [Dwo06, Dwo08, Dwo09].
Roughly speaking, differential privacy guarantees that the output of an algorithm A will not depend "too much"
on any particular record of the underlying data set $\mathcal{D}$. We design testing algorithms to test whether A satisfies
generalized differential privacy [BBG+11] or not. Generalized differential privacy is a relaxation of differential
privacy and follows the same principles as differential privacy. Under a specific setting of parameters, generalized
differential privacy collapses to the definition of differential privacy. For a precise definition, see Section 2.1. It
seems to us (and we make it more formal later on) that it may not be possible to design a computationally efficient
testing algorithm for testing the notion of exact differential privacy, since in some sense it is a worst-case notion
of privacy (see [BBG+11, BD12] for a discussion on this).



Testing Lipschitz property under product distribution and its connection to privacy testing. The goal of
testing properties of functions is to distinguish functions which satisfy a given property from functions
which are "far" from satisfying the property. The notion of "far" is usually the fraction of points in the domain of
the function on which the function needs to be redefined to make it satisfy the property.

To test for generalized differential privacy, we show a new connection between differential privacy and the
problem of testing the Lipschitz property, which was first studied by [JR11]. A recent line of work [JR11, AJMR12b,
AJMR12a] has sought to explore applications of sublinear algorithms (specifically, property testers and
reconstructors) to data privacy. We continue this line of work and show the first application of property testers
(which are vastly more efficient than property reconstructors) to the setting of data privacy. Indeed, prior to this
work it was not clear if property testers for the Lipschitz property can be used at all in the data privacy setting.

Let $\mathcal{T}$ be the universe from which data sets are drawn, where each data set has the same number of records. A
function $f : \mathcal{T} \to \mathbb{R}$ is $\alpha$-Lipschitz if for all pairs of points $x, x' \in \mathcal{T}$ the following condition holds:
$|f(x) - f(x')| \le \alpha \cdot d_H(x, x')$, where $d_H$ is the Hamming distance between $x$ and $x'$ (that is, $d_H(x, x')$ is the
number of entries in which $x$ and $x'$ differ). To define a Lipschitz tester, we define the notion of distance between
functions $f$ and $g$ defined on the same (finite) domain $\mathcal{T}$ under distribution Distr as follows:
$\mathrm{dist}(f, g) \stackrel{\mathrm{def}}{=} \Pr_{x \sim \mathrm{Distr}}[f(x) \neq g(x)]$. A
Lipschitz tester gets oracle access to a function $f$ and a distance parameter $\epsilon \in (0, 1]$. It accepts Lipschitz
functions $f$ and rejects with high probability functions $f$ which are $\epsilon$-far from the Lipschitz property, namely,
functions $f$ for which $\min_g \mathrm{dist}(f, g) \ge \epsilon$, where the minimum is taken over all Lipschitz functions $g$. In this
work, we extend the result of [JR11] to the setting of product distributions.

While Distr is usually taken to be the uniform distribution in the property testing literature, in our setting
it will be important to allow Distr to be a more general distribution. Taking Distr to be something other than
the uniform distribution is challenging to investigate even for the special case of product distributions. Indeed,
prior to this work the only positive result known for the product distribution setting is the work by [AC06] for
monotonicity testing. For the setting where Distr is an arbitrary unknown distribution, exponential lower bounds on
the computational efficiency of the tester are known [HK07]. Above result is stated for functions with discrete range
of the form

In this paper, we show that one can use a Lipschitz property testing algorithm (Liptest) as a proxy for testing
generalized differential privacy. The tester Liptest should be able to efficiently sample data sets according
to a given probability distribution defined over the domain of these data sets (see Definition 2.2). It has been shown
that this additional requirement is sufficient to give strong privacy guarantees for the algorithm being tested. (For
further details see Section 3.) Additionally, for practical applications, this tester should run efficiently, especially
over a large data set domain.

With the above motivation in mind, we have designed such a Lipschitz tester with sublinear time complexity
(with respect to the domain size) for the hypercube domain $\mathcal{T} = \{0, 1\}^d$ with a product distribution defined on
data sets in $\mathcal{T}$. (For further details, we refer the reader to Section 4.) With this construction, we can test the
privacy guarantees of an algorithm in time that is poly-logarithmic in the domain size.

1.1 Related Work 

In the last few years, various notions of data privacy have been proposed. Some of the most prominent are
k-anonymity [Swe02], $\ell$-diversity [MGKV06], differential privacy [DMNS06], noiseless privacy [BBG+11],
natural differential privacy [BD12] and generalized differential privacy [BBG+11]. With ad-hoc notions like
k-anonymity and $\ell$-diversity having been broken [GKS08], the privacy community has pretty much converged to
theoretically sound notions of privacy like differential privacy. In this paper, we work with the definition of
generalized differential privacy (GDP), which is a generalization of differential privacy, noiseless privacy and
natural differential privacy. The primary difference between GDP and the other related definitions is that it
incorporates both the randomness in the underlying data set $\mathcal{D}$ and the randomness of the Algorithm A, whereas
other notions consider either the randomness of the data or the randomness of the algorithm.

In this paper, we design algorithms (Atest) to test whether a given algorithm A satisfies GDP. In all our
algorithms, we assume that A is given as a "white-box", i.e., complete access to the source code of A is provided.
In this paper, all the instantiations of Atest are probabilistic and use Lipschitz property testing algorithms as
the underlying tool set. On a related note, in the field of formal verification there have been recent works [RP10]
using which one can guarantee that a given algorithm A satisfies differential privacy. The caveat of these kinds of
static-analysis-based algorithms is that they need the source code for A to be written in a type-safe language,
which is hard for a non-expert to adapt to.

One of the primary reasons for considering sublinear (with respect to the domain size) time Lipschitz testers
is the large size of the domain often encountered in the study of statistical privacy of databases. Property testers
([RS96b, GGR98b]) have been extensively studied for various approximation and decision problems. They are of
particular interest because they usually have sublinear (in input size) running time, which is valuable in
problems with large inputs. Some of the ideas and definitions in this paper have been taken from the work
on distribution testing ([HK07, GS09, AC06]). Lipschitz property testers were introduced in [JR11] (which gave
the explicit tester for the hypercube domain) and have since been studied in [AJMR12b, AJMR12a] for the
hypergrid domain. Recently [CS12] proposed an optimal Lipschitz tester for the hypercube domain with the
underlying distribution being uniform.



1.2 Our Contributions 

• Formulate testing of data privacy property as Lipschitz property testing: In this paper we initiate the
study of testing privacy properties of a given candidate algorithm A. The specific privacy property that we
test is generalized differential privacy (GDP) (see Definition 2.2). In order to design a tester for the GDP
property, we cast the problem of testing the GDP property as a problem of testing Lipschitzness. (See Theorem 3.1.)
The problem of testing Lipschitzness was initially proposed by [JR11].

• Design a generic transformation to convert an Algorithm A to its GDP variant: We design a generic
transformation to convert a candidate algorithm A to its generalized differentially private variant. (See
Theorem 3.5.)

• New results for Lipschitz property testing: In order to allow our privacy tester to be effective for a large
class of data generating distributions, we extend the existing results of Lipschitz property testing to work
with product distributions. We give the first efficient tester for the Lipschitz property for the hypercube
domain which works for arbitrary product distributions. (See Theorem 4.1.) Previous works (even for other
function properties) have mostly focused on the case of the uniform distribution. To the best of our knowledge
this is the only non-trivial positive result in property testing over arbitrary product distributions apart from
the result of [AC06] on monotonicity testing.

• Concrete instantiation of privacy testers based on old and new Lipschitz testers: We instantiate the privacy
tester using the Lipschitz tester described in the previous item to get a concrete instantiation of a privacy tester.
This also leads to a concrete instantiation of Item 2 mentioned above. We also instantiate privacy testers
based on known Lipschitz testers in the literature. This is summarized in Section 5.



1.3 Organization of the paper 

In Section 2 we introduce the notions of privacy used in this paper, namely, differential privacy and generalized
differential privacy. We also introduce the concepts of general property testing and the specific instantiation of
Lipschitz property testing. In Section 3 we show the formal connection between testing of generalized differential
privacy (GDP) and Lipschitz property testing. In Section 4 we state our new results on Lipschitz property testing
over product distributions in the hypercube domain. In Section 5 we show that Lipschitz testers over the hypergrid
domain can be used to test for GDP when the data sets are drawn uniformly from the hypergrid domain. Lastly, in
Section 6 we conclude with discussions and open problems.



2 Preliminaries 

2.1 Differential Privacy and Generalized Differential Privacy 

In the last few years, differential privacy [DMNS06] has become a well-accepted notion of statistical data privacy
in the data privacy community. At a high level the definition of differential privacy implies that the output of
a differentially private algorithm will be "almost" the same from an adversary's perspective irrespective of an
individual's presence or absence in the underlying data set. It is a meaningful notion because the
presence or absence of an individual in the data set does not affect the output of the algorithm "too much". This
high-level intuition can be formalized as below:

Definition 2.1 ($(\alpha, \gamma)$-Differential Privacy [DMNS06, DKMN06]). A randomized algorithm A is
$(\alpha, \gamma)$-differentially private if for any two data sets $\mathcal{D}$ and $\mathcal{D}'$ drawn from a domain $\mathcal{T}$ with
$|\mathcal{D} \triangle \mathcal{D}'| = 1$ ($\triangle$ being the symmetric difference), and for all measurable sets $O \subseteq Range(A)$,
the following holds:

$\Pr[A(\mathcal{D}) \in O] \le e^{\alpha} \Pr[A(\mathcal{D}') \in O] + \gamma$
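
For an algorithm with a finite output set, the condition in Definition 2.1 can be checked directly for one pair of neighboring data sets: the worst-case set $O$ collects exactly the outputs whose probability under $\mathcal{D}$ exceeds $e^{\alpha}$ times the probability under $\mathcal{D}'$. The Python sketch below illustrates this, assuming the two output distributions are available explicitly as dictionaries. The function names and the randomized-response example are hypothetical and not taken from the paper.

```python
import math

def dp_excess(p, q, alpha):
    """Max over sets O of Pr_p[O] - e^alpha * Pr_q[O], for finite output distributions."""
    outputs = set(p) | set(q)
    # The maximizing set O contains exactly the outputs with a positive term.
    return sum(max(0.0, p.get(o, 0.0) - math.exp(alpha) * q.get(o, 0.0)) for o in outputs)

def satisfies_dp_on_pair(p, q, alpha, gamma, tol=1e-12):
    """Check the (alpha, gamma) condition in both directions for one neighboring pair."""
    return (dp_excess(p, q, alpha) <= gamma + tol
            and dp_excess(q, p, alpha) <= gamma + tol)

# Hypothetical example: randomized response on one bit, truthful with probability 3/4.
p = {0: 0.75, 1: 0.25}  # output distribution on data set D
q = {0: 0.25, 1: 0.75}  # output distribution on the neighboring data set D'
print(satisfies_dp_on_pair(p, q, math.log(3), 0.0))  # probability ratio is exactly 3
print(satisfies_dp_on_pair(p, q, 1.0, 0.0))          # e^1 < 3, so gamma = 0 is too small
print(satisfies_dp_on_pair(p, q, 1.0, 0.1))          # a positive gamma absorbs the excess
```

The small tolerance `tol` only guards against floating-point rounding when the ratio sits exactly on the boundary.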



In the above definition if $\gamma = 0$, we simply call it $\alpha$-differential privacy. In this paper we intend to test if an
algorithm A is $\alpha$-differentially private. In order to test the above, we mould the problem into a problem of testing
Lipschitzness over the probability measure induced by Algorithm A over a finite set $S$ (see Section 3 for more
discussion on this). Since we want to test Lipschitzness efficiently with respect to the size of the set $S$, we will use
a relaxed notion of differential privacy called generalized differential privacy (GDP) [BBG+11]. The main idea
behind GDP is that it allows us to incorporate the randomness over the data generating distribution. This in turn
allows us to incorporate the failure probability of the Lipschitzness testing algorithm (over the randomness of the
data generating distribution). The definition of GDP below is a slight modification of the definition proposed in
[BBG+11] and in most natural settings is stronger than [BBG+11].



Definition 2.2 ($(\alpha, \gamma, \beta)$-Generalized Differential Privacy). Let Distr be the distribution over the space of all
data sets drawn from domain $\mathcal{T}$. Let $W \subseteq \mathcal{T}$ be a set such that $\Pr_{\mathcal{D} \sim \mathrm{Distr}}[\mathcal{D} \in W] \le \beta$. A randomized
algorithm A is $(\alpha, \gamma, \beta)$-generalized differentially private (GDP) if for any pair of data sets $\mathcal{D}, \mathcal{D}' \in \mathcal{T} \setminus W$ with
$|\mathcal{D} \triangle \mathcal{D}'| = 1$ ($\triangle$ being the symmetric difference) and for all measurable sets $O \subseteq Range(A)$ the following holds:
$\Pr[A(\mathcal{D}) \in O] \le e^{\alpha} \Pr[A(\mathcal{D}') \in O] + \gamma$, where the probability is over the randomness of the Algorithm A.

It is worth mentioning here that the above definition generalizes the noiseless privacy definition [BBG+11]
and the natural differential privacy definition [BD12] in the literature. While in both the noiseless and natural
differential privacy definitions the randomness is solely over the data generating distribution Distr, in GDP the
randomness is both over the data generating distribution and the randomness of the algorithm.

At a high level what GDP says is that there exists a set $W$ of "bad" data sets where the $(\alpha, \gamma)$-differential privacy
condition does not hold. But the probability of drawing a data set $\mathcal{D}$ (over the data generating distribution Distr)
from $W$ is at most $\beta$ (which is usually negligible in the problem parameters). In fact if we set $\beta = 0$, then we
recover the $(\alpha, \gamma)$-differential privacy definition (see Definition 2.1) exactly. Similarly, it can be shown that under
different choices of $(\alpha, \gamma, \beta)$ GDP implies both noiseless privacy and natural differential privacy.
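
The role of the "bad" set $W$ can be made concrete on a tiny domain. The Python sketch below assumes data sets are points of $\{0,1\}^d$ drawn from a product distribution, treats points at Hamming distance 1 as neighbors, and assumes the output probabilities are available through a function `mu(D, o)`. It builds one valid (not necessarily smallest) choice of $W$ by collecting both endpoints of every violating neighboring pair, so a positive answer certifies GDP while a negative answer is inconclusive. All names and the toy mechanism are hypothetical.

```python
import math
from itertools import product

def neighbors(x):
    """All points at Hamming distance 1 from x."""
    return [x[:i] + (1 - x[i],) + x[i + 1:] for i in range(len(x))]

def point_prob(x, p):
    """Probability of x under the product distribution Ber(p[0]) x ... x Ber(p[d-1])."""
    prob = 1.0
    for xi, pi in zip(x, p):
        prob *= pi if xi == 1 else 1.0 - pi
    return prob

def bad_set(mu, outputs, d, alpha, gamma, tol=1e-12):
    """Endpoints of every neighboring pair that violates the (alpha, gamma) condition."""
    W = set()
    for x in product((0, 1), repeat=d):
        for y in neighbors(x):
            excess = sum(max(0.0, mu(x, o) - math.exp(alpha) * mu(y, o)) for o in outputs)
            if excess > gamma + tol:
                W.update((x, y))
    return W

def certifies_gdp(mu, outputs, p, alpha, gamma, beta):
    """True if this particular W has mass at most beta (sufficient, not necessary)."""
    W = bad_set(mu, outputs, len(p), alpha, gamma)
    return sum(point_prob(x, p) for x in W) <= beta

def mu(D, o):
    """Toy mechanism: outputs 1 w.p. 0.99 on the all-ones data set, else w.p. 0.5."""
    p1 = 0.99 if all(D) else 0.5
    return p1 if o == 1 else 1.0 - p1

print(certifies_gdp(mu, (0, 1), (0.1, 0.1, 0.1), 0.1, 0.0, 0.05))
print(certifies_gdp(mu, (0, 1), (0.1, 0.1, 0.1), 0.1, 0.0, 0.01))
```

Here only the all-ones data set behaves differently, so $W$ consists of that point and its three neighbors; whether that is acceptable depends entirely on how much probability the data generating distribution places there.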



2.2 Lipschitz Property Testing 

In this work we show that efficiently testing whether an algorithm is $(\alpha, \gamma, \beta)$-generalized differentially private
reduces to the problem of testing (with high success probability over the probability measure induced by Algorithm
A) whether the output is Lipschitz. (For further details, see Section 3.)

Definition 2.3. Given a function $f : D \to R$ from a metric space $(D, d_D)$ to $(R, d_R)$, where $d_D$ and $d_R$ denote the
distance functions on the domain $D$ and the range $R$ respectively, the function $f$ is $c$-Lipschitz if
$d_R(f(x), f(y)) \le c \cdot d_D(x, y)$ for all $x, y \in D$.
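
As a concrete instance of Definition 2.3, the Python sketch below checks the $c$-Lipschitz condition by brute force, assuming the domain is the hypercube $\{0,1\}^d$ with the Hamming metric and the range is the reals with the absolute difference. Under the Hamming metric it suffices to check pairs that differ in a single coordinate, since the bound for distant pairs then follows from the triangle inequality. The function name is hypothetical, and the exhaustive loop is only meant for tiny $d$.

```python
from itertools import product

def is_c_lipschitz(f, d, c=1.0):
    """Brute-force check of |f(x) - f(y)| <= c over all hypercube edges (x, y)."""
    for x in product((0, 1), repeat=d):
        for i in range(d):
            if x[i] == 0:  # visit each edge once, from its 0-endpoint
                y = x[:i] + (1,) + x[i + 1:]
                if abs(f(x) - f(y)) > c:
                    return False
    return True

print(is_c_lipschitz(lambda x: sum(x), 4))             # Hamming weight changes by 1 per flip
print(is_c_lipschitz(lambda x: 2 * sum(x), 4))         # changes by 2 per flip
print(is_c_lipschitz(lambda x: 2 * sum(x), 4, c=2.0))
```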

Property testing ([GGR98b, RS96b]) is a well-studied area pertaining to randomized approximation algorithms
for decision problems, usually having sublinear time and query complexity. At one end of the spectrum,
most of the work previously done in this area assumes a uniform distribution over domain elements. The other end
is to consider the setting where the distribution over the domain points is not known ([HK07]).

Here, we assume that the probability measure over domain elements is known and is not necessarily uniform.
Although seemingly important, to the best of our knowledge, this is the first time that such a setting is explored in
Lipschitz property testing. To state our results, we will need the following notation.
Let $\mathcal{P}$ (e.g., Lipschitzness in this case) be the property that needs to be tested over the range of function
$f : D \to R$. We define the distance of the function $f$ from $\mathcal{P}$ as follows.

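Definition 2.4. Let $\mathcal{P}$ and the domain be defined as above, and let Distr be the probability distribution over the
domain. The distance between functions $f$ and $g$ is defined by $\mathrm{dist}(f, g) = \Pr_{x \sim \mathrm{Distr}}[f(x) \neq g(x)]$. The
distance of a function $f$ from property $\mathcal{P}$ is defined as $\mathrm{dist}(f, \mathcal{P}) = \min_{g \in \mathcal{P}} \mathrm{dist}(f, g)$. We say that $f$ is
$\epsilon$-far from a property $\mathcal{P}$ if $\mathrm{dist}(f, \mathcal{P}) \ge \epsilon$.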
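
To see how a non-uniform distribution changes the distance between two functions, the Python sketch below computes it exactly on a small hypercube, assuming the domain is $\{0,1\}^d$ and the distribution is a product of Bernoulli distributions with parameters `p`. The names are hypothetical and the enumeration is exponential in $d$, so this is an illustration rather than a testing procedure.

```python
from itertools import product

def point_prob(x, p):
    """Probability of x under the product distribution Ber(p[0]) x ... x Ber(p[d-1])."""
    prob = 1.0
    for xi, pi in zip(x, p):
        prob *= pi if xi == 1 else 1.0 - pi
    return prob

def dist(f, g, p):
    """Probability, over x drawn from the product distribution, that f(x) != g(x)."""
    return sum(point_prob(x, p)
               for x in product((0, 1), repeat=len(p))
               if f(x) != g(x))

f = lambda x: sum(x)
g = lambda x: 0 if all(x) else sum(x)  # differs from f only at the all-ones point

print(dist(f, g, (0.5, 0.5)))  # uniform distribution: the point has mass 1/4
print(dist(f, g, (0.9, 0.9)))  # skewed distribution: the same point has mass about 0.81
```

The two functions disagree on a single point in both cases, yet the distance more than triples once the distribution concentrates on that point.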

We will need the notion of the image diameter of a function $f$ for explaining our results, which, roughly
speaking, is the difference between the maximum and minimum values taken by $f$ on domain $\mathcal{T}$.

Definition 2.5 (Image diameter). The image diameter of a function $f : \mathcal{T} \to \mathbb{R}$, denoted by $ImD(f)$, is the
difference between the maximum and the minimum values attained by $f$, i.e., $\max_{x \in \mathcal{T}} f(x) - \min_{x \in \mathcal{T}} f(x)$.



3 Test for Generalized Differential Privacy 

In this work we initiate the study of testing whether a given algorithm A satisfies statistical data privacy guarantees.
As a specific instantiation of the problem, we study the notion of generalized differential privacy (GDP) (see
Definition 2.2). Roughly speaking, the GDP guarantee ensures that the output of Algorithm A when executed on data
set $\mathcal{D}$ does not depend "too much" on any one entry of $\mathcal{D}$. The term "too much" is formalized by three parameters
$\alpha$, $\gamma$ and $\beta$, where the first two parameters ($\alpha$ and $\gamma$) depend on the randomness of the Algorithm A and the
parameter $\beta$ depends on the randomness of the distribution Distr generating the data. We refer to the guarantee
as $(\alpha, \gamma, \beta)$-Generalized Differential Privacy (or simply $(\alpha, \gamma, \beta)$-GDP).

Given an algorithm A, we design a tester Atest with the following property: if the tester outputs YES, then
Algorithm A is $(\alpha, \gamma, \beta)$-generalized differentially private where the parameters $\beta$ and $\gamma$ can be made arbitrarily
small (at the cost of increased running time). If the tester outputs NO, then the Algorithm A is not $\alpha$-differentially
private. We state this formally below.



Theorem 3.1 ($(\theta, \alpha, \gamma, \beta)$-Privacy testing). Let Liptest be a $\theta$-approximate Lipschitz tester (see Definition 3.2
below), let Distr be a distribution on the domain of data sets $\mathcal{T}$ and let A be an algorithm which on input
$\mathcal{D} \sim \mathrm{Distr}$ outputs a value $A(\mathcal{D})$ in the finite set $\Gamma$. Suppose there is an oracle $O_A$ which for every value $o \in \Gamma$
and for every $\mathcal{D} \in \mathcal{T}$ allows constant time access to the probability measure $\mu(A(\mathcal{D}) = o)$ (where the measure is
over the randomness of the algorithm A). Then there exists a "testing" algorithm Atest which on input privacy
parameters $\alpha, \beta \in (0, 1]$, failure probability parameter $\gamma \in (0, 1]$ and access to $O_A$ and Distr satisfies the
following guarantee.

• (soundness) If Algorithm Atest outputs NO, then the candidate algorithm A is not $\alpha$-differentially private.

• (completeness) If Algorithm Atest outputs YES, then with probability at least $1 - \gamma$ the candidate algorithm
A is $(\alpha\theta, 0, \beta)$-generalized differentially private.

The algorithm Atest uses Liptest as a subroutine and runs in time $O(|\Gamma| \cdot (\text{running time of Liptest}))$.



To prove Theorem 3.1 we show a new connection between testing $(\alpha, 0, \beta)$-GDP and the problem of testing the
Lipschitz property. The study of testing the Lipschitz property was initiated by [JR11]. We present an algorithm Atest
for testing $(\alpha, 0, \beta)$-GDP based on a generalization of the Lipschitz tester presented in [JR11]. We formally define
the (generalized) Lipschitz tester below, where the definition differs from the standard property testing definition
(for example, as used in [JR11]) in two aspects: (i) we require Lipschitz testers only to distinguish Lipschitz
functions from functions which are far from $\theta$-Lipschitz functions for some fixed $\theta \ge 1$ and (ii) we measure
distance between functions (in particular, how "far" the function is from satisfying the property) with respect to a
pre-defined probability measure Distr on the domain.

Definition 3.2 ($\theta$-approximate Lipschitz tester). A $\theta$-approximate Lipschitz tester Liptest is a randomized
algorithm that gets as input: (i) oracle access to function $f : \mathcal{T} \to \mathbb{R}$; (ii) oracle access to independent samples
from distribution Distr on $\mathcal{T}$ and (iii) parameters $\epsilon, \gamma \in (0, 1]$. It outputs a YES/NO value and provides the
following guarantee.

• If Liptest outputs NO, then with probability 1, the function $f$ is not Lipschitz.

• If Liptest outputs YES, then with probability at least $1 - \gamma$, there exists a set $W \subseteq \mathcal{T}$ such that (i) the
input function $f$ is $\theta$-Lipschitz on the domain $\mathcal{T} \setminus W$ and (ii) $\Pr_{x \sim \mathrm{Distr}}[x \in W] \le \epsilon$.

We remark that setting $\theta = 1$ and Distr to be the uniform distribution on $\mathcal{T}$ recovers the standard definition
of property tester (in our case, Lipschitz tester as defined in [JR11]).



In Section 3.2 we show that one can extend the connection between GDP and Lipschitz testing to design an
algorithm AprivGen which converts the candidate algorithm A into an $(\alpha, \gamma, \beta)$-generalized differentially private
algorithm.

3.1 (Generalized) Differential Privacy as Lipschitz Property over a Probability Measure 

Consider the domain of the data sets $\mathcal{T}$ to be a finite set and assume that (the randomized) Algorithm A, whose
privacy property is to be tested, maps a data set $\mathcal{D} \in \mathcal{T}$ to another finite set $\Gamma$, i.e., any output of A is always an
element in $\Gamma$. Now let us look at the privacy guarantee of GDP (see Definition 2.2). Ignoring the parameters $\beta$ and
$\gamma$, the privacy guarantee suggests that for any pair of neighboring data sets $\mathcal{D}, \mathcal{D}' \in \mathcal{T}$ (drawn from the distribution
Distr) and any $o \in \Gamma$, the following is true:

$e^{-\alpha} \mu(A(\mathcal{D}') = o) \le \mu(A(\mathcal{D}) = o) \le e^{\alpha} \mu(A(\mathcal{D}') = o)$    (1)

The measure $\mu$ is the probability induced by the randomness of the Algorithm A. Taking the logarithm of (1), we
get

$|\log \mu(A(\mathcal{D}) = o) - \log \mu(A(\mathcal{D}') = o)| \le \alpha$    (2)

We will use the following formulation of (2): $|\frac{1}{\alpha} \log \mu(A(\mathcal{D}) = o) - \frac{1}{\alpha} \log \mu(A(\mathcal{D}') = o)| \le d_H(\mathcal{D}, \mathcal{D}')$, where
$d_H$ is the Hamming metric. Now, if we view the expression $\frac{1}{\alpha} \log \mu(A(\mathcal{D}) = o)$ as a function $\lambda_o : \mathcal{T} \to \mathbb{R}$ defined
by setting $\lambda_o(\mathcal{D}) = \frac{1}{\alpha} \log \mu(A(\mathcal{D}) = o)$, then we get the following condition: $|\lambda_o(\mathcal{D}) - \lambda_o(\mathcal{D}')| \le d_H(\mathcal{D}, \mathcal{D}')$.
This condition is exactly the Lipschitzness guarantee for $\lambda_o$ under the Hamming metric. Using this observation we
state the following meta-algorithm Atest (Algorithm 1) to test whether a given Algorithm A is $(\alpha, 0, \beta)$-generalized
differentially private. In Algorithm 1 (Algorithm Atest), we use a black-box Lipschitz property tester Liptest.
Later in the paper we instantiate Liptest with specific testing algorithms.
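
The rescaling by $1/\alpha$ can be checked numerically on a toy mechanism. The Python sketch below assumes the domain is $\{0,1\}^d$ and uses a hypothetical mechanism that outputs 1 with probability $(1 + \text{number of ones in } \mathcal{D})/(d + 2)$. For this mechanism the largest probability ratio across neighboring data sets is 2, so the rescaled log-probability function should have a largest neighbor gap of about 1 when $\alpha = \ln 2$, and a gap above 1 for any smaller $\alpha$. The names `mu`, `lam` and `max_edge_gap` are illustrative only.

```python
import math
from itertools import product

def mu(D, o):
    """Toy mechanism: outputs 1 with probability (1 + sum(D)) / (len(D) + 2)."""
    p1 = (1 + sum(D)) / (len(D) + 2)
    return p1 if o == 1 else 1.0 - p1

def lam(D, o, alpha):
    """Rescaled log-probability (1/alpha) * log mu(A(D) = o)."""
    return math.log(mu(D, o)) / alpha

def max_edge_gap(d, o, alpha):
    """Largest |lam(D) - lam(D')| over neighboring data sets D, D' in {0,1}^d."""
    gap = 0.0
    for D in product((0, 1), repeat=d):
        for i in range(d):
            if D[i] == 0:
                Dp = D[:i] + (1,) + D[i + 1:]
                gap = max(gap, abs(lam(D, o, alpha) - lam(Dp, o, alpha)))
    return gap

for o in (0, 1):
    print(o, max_edge_gap(3, o, math.log(2)), max_edge_gap(3, o, 0.5))
```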

Algorithm 1 Atest: Generalized Differential Privacy (GDP) tester

Require: Algorithm A, data generating distribution Distr, data domain $\mathcal{T}$, output range $\Gamma$, privacy parameters
$(\alpha, \beta)$ and failure parameter $\gamma$
flag ← FALSE
Let Liptest be a $\theta$-approximate Lipschitz tester defined in Definition 3.2
for all values $o \in \Gamma$ do
    Define function $\lambda_o : \mathcal{T} \to \mathbb{R}$ by setting $\lambda_o(\mathcal{D}) = \frac{1}{\alpha} \log \mu(A(\mathcal{D}) = o)$.
    Run Liptest on $\lambda_o$ with proximity parameter $\beta / |\Gamma|$ and failure probability parameter $\gamma / |\Gamma|$.
    If Liptest outputs NO, then flag ← TRUE
end for
If flag = FALSE, then output YES, otherwise output NO

At a high level Algorithm Atest does the following. For each possible output $o \in \Gamma$, it defines a function
table $\lambda_o$ (with the domain $\mathcal{T}$). It then invokes the Lipschitz testing algorithm Liptest to test $\lambda_o$ for the Lipschitz
property. If for every output $o \in \Gamma$, Liptest outputs YES, then Atest outputs affirmative, and outputs negative
otherwise.
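
The control flow of Atest can be sketched in a few lines of Python. The sketch assumes the domain is $\{0,1\}^d$, that `mu(D, o)` plays the role of the probability oracle and is strictly positive, and it plugs in an exhaustive edge check as a stand-in for Liptest. That stand-in is not sublinear and is not one of the testers from the paper; it simply satisfies the tester guarantee with $\theta = 1$ and an empty bad set, which keeps the example deterministic. All function names are hypothetical.

```python
import math
from itertools import product

def hypercube_edges(d):
    """All pairs of points in {0,1}^d that differ in exactly one coordinate."""
    for x in product((0, 1), repeat=d):
        for i in range(d):
            if x[i] == 0:
                yield x, x[:i] + (1,) + x[i + 1:]

def exhaustive_liptest(f, d, eps, fail_prob, tol=1e-9):
    """Stand-in for Liptest: checks every edge, so eps and fail_prob are unused.
    Returns True for YES and False for NO."""
    return all(abs(f(x) - f(y)) <= 1.0 + tol for x, y in hypercube_edges(d))

def a_test(mu, d, outputs, alpha, beta, gamma, liptest=exhaustive_liptest):
    """Loop of Algorithm 1; assumes mu(D, o) > 0 so that the logarithm is defined."""
    flag = False
    for o in outputs:
        lam_o = lambda D, o=o: math.log(mu(D, o)) / alpha
        if not liptest(lam_o, d, beta / len(outputs), gamma / len(outputs)):
            flag = True
    return not flag  # True means YES, False means NO

def mu_rr(D, o):
    """Hypothetical mechanism: release the first bit of D truthfully w.p. 3/4."""
    return 0.75 if D[0] == o else 0.25

print(a_test(mu_rr, 3, (0, 1), math.log(3), 0.1, 0.1))  # ratio 3 = e^{ln 3}
print(a_test(mu_rr, 3, (0, 1), 1.0, 0.1, 0.1))          # ratio 3 exceeds e^1
```

Passing the tester as a parameter mirrors the black-box use of Liptest in Algorithm 1, so a sublinear tester could be substituted without touching the loop.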

3.1.1 Proof of Theorem 3.1

The claim about the running time of Algorithm Atest stated in Theorem 3.1 follows directly from the definition of
Algorithm Atest (Algorithm 1). We state and prove the soundness and completeness guarantees of Theorem 3.1
separately as Claim 3.3 and Claim 3.4 respectively below.



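Claim 3.3 (Soundness guarantee). If Algorithm Atest (Algorithm 1) outputs NO, then the candidate algorithm
A is not $\alpha$-differentially private.

Proof. If Algorithm Atest outputs NO, then there exists an $o \in \Gamma$ such that Liptest outputs NO on $\lambda_o$. By the
definition of Liptest (see Definition 3.2), we get that $\lambda_o$ is not Lipschitz. In other words, for some pair of
neighboring data sets $\mathcal{D}, \mathcal{D}'$ we have $|\lambda_o(\mathcal{D}) - \lambda_o(\mathcal{D}')| = |\frac{1}{\alpha} \log \mu(A(\mathcal{D}) = o) - \frac{1}{\alpha} \log \mu(A(\mathcal{D}') = o)| > 1$.
Therefore, either $\mu(A(\mathcal{D}) = o) > e^{\alpha} \mu(A(\mathcal{D}') = o)$ or $\mu(A(\mathcal{D}) = o) < e^{-\alpha} \mu(A(\mathcal{D}') = o)$, as required. □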

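Claim 3.4 (Completeness guarantee). If Algorithm Atest (Algorithm 1) outputs YES, then with probability at
least $1 - \gamma$ (over the randomness of Liptest), the candidate algorithm A is $(\alpha\theta, 0, \beta)$-generalized differentially
private.

Proof. If Algorithm Atest outputs YES, then by the union bound it follows that with probability at least $1 - \gamma$,
the following condition holds for every $o \in \Gamma$: there exists a set $W_o \subseteq \mathcal{T}$ such that (i) $\lambda_o$ satisfies the $\theta$-Lipschitz
condition for every $\mathcal{D}, \mathcal{D}' \in \mathcal{T} \setminus W_o$ and (ii) $\Pr_{x \sim \mathrm{Distr}}[x \in W_o] \le \beta / |\Gamma|$.

Let $W = \bigcup_{o \in \Gamma} W_o$. We show that with probability at least $1 - \gamma$ (over the randomness of Liptest), the following
holds: algorithm A satisfies the $\alpha\theta$-differential privacy condition on the set $\mathcal{T} \setminus W$ and
$\Pr_{\mathcal{D} \sim \mathrm{Distr}}[\mathcal{D} \in W] \le \beta$.

Condition (i) above implies that for every $o \in \Gamma$, $\lambda_o$ is $\theta$-Lipschitz on $\mathcal{T} \setminus W$. Therefore, we get the following
for every neighboring pair of data sets $\mathcal{D}, \mathcal{D}' \in \mathcal{T} \setminus W$.

$|\lambda_o(\mathcal{D}) - \lambda_o(\mathcal{D}')| \le \theta$

$\Leftrightarrow |\frac{1}{\alpha} \log \mu(A(\mathcal{D}) = o) - \frac{1}{\alpha} \log \mu(A(\mathcal{D}') = o)| \le \theta$

$\Leftrightarrow e^{-\alpha\theta} \le \frac{\mu(A(\mathcal{D}) = o)}{\mu(A(\mathcal{D}') = o)} \le e^{\alpha\theta}$

Also, using Condition (ii) and the union bound over all $o \in \Gamma$, we get the following.

$\Pr_{\mathcal{D} \sim \mathrm{Distr}}[\mathcal{D} \in W] \le \sum_{o \in \Gamma} \Pr_{\mathcal{D} \sim \mathrm{Distr}}[\mathcal{D} \in W_o] \le \beta.$

Since Conditions (i) and (ii) both hold with probability at least $1 - \gamma$ (over the randomness of Liptest), we
get the desired claim.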

□ 

3.2 Application of GDP tester to ensure privacy for the output of a given candidate algorithm 

In this section we demonstrate how one can use Algorithm Atest (Algorithm 1) designed in the previous
section to guarantee $(\alpha, \gamma, \beta)$-generalized differential privacy for the output produced by a candidate Algorithm A.
The details are given in Algorithm 2. The theoretical guarantees for Algorithm 2 are given below.

Theorem 3.5 ($(\theta, \alpha, \gamma, \beta)$-generalized differentially private mechanism). Let Liptest be a $\theta$-approximate
Lipschitz tester (see Definition 3.2) used in the testing algorithm Atest (Algorithm 1). Under the same assumptions as
Theorem 3.1, the following are true for Algorithm AprivGen (Algorithm 2).

• (privacy) Algorithm AprivGen (Algorithm 2) is $(\alpha\theta, \gamma, \beta)$-generalized differentially private (GDP).

• (utility) If the candidate Algorithm A is $\alpha$-differentially private, then Algorithm AprivGen (Algorithm 2)
always produces the output $A(\mathcal{D})$.



Algorithm 2 AprivGen: Generalized differentially private mechanism

Require: Data set $\mathcal{D}$, candidate algorithm A, testing algorithm Atest, data generating distribution Distr, data domain $T$, output set $\Gamma$, privacy parameters $(\alpha, \beta, \gamma)$
1: Run Atest with parameters A, Distr, $T$, $\Gamma$, privacy parameters $(\alpha, \beta)$, and failure parameter $\gamma$
2: If Atest outputs YES, then output $\mathcal{A}(\mathcal{D})$; otherwise output FAILURE
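The control flow of Algorithm 2 can be sketched as a thin wrapper. This is a minimal illustration under assumptions: the tester is modelled as a hypothetical callable `tester` that already has Distr, the domain, the output set and the privacy parameters bound to it, and the names `a_priv_gen` and `FAILURE` are not from the paper.

```python
from typing import Callable, Hashable, Sequence, Union

Dataset = Sequence[int]
Candidate = Callable[[Dataset], Hashable]
FAILURE = "FAILURE"

def a_priv_gen(
    dataset: Dataset,
    candidate: Candidate,
    tester: Callable[[Candidate], bool],
) -> Union[Hashable, str]:
    """Release candidate(dataset) only if the privacy tester answers YES."""
    # Step 1: the tester examines the candidate algorithm; it is never
    # given `dataset`, so its verdict does not depend on the data set.
    if tester(candidate):
        # Step 2: on YES, output A(D).
        return candidate(dataset)
    # Otherwise output FAILURE, which carries no information about the data.
    return FAILURE

# Usage with stand-in testers (hypothetical stubs, not a real privacy test).
released = a_priv_gen([1, 0, 1], lambda d: sum(d), lambda a: True)
refused = a_priv_gen([1, 0, 1], lambda d: sum(d), lambda a: False)
```

Keeping the data set out of the tester's inputs is what makes the FAILURE branch trivially private in the proof below.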



3.2.1 Proof of Theorem 3.5

The proof of Theorem 3.5 follows from the two claims below.

Claim 3.6 (Privacy). Algorithm AprivGen (Algorithm 2) is $(\alpha\theta, \gamma, \beta)$-generalized differentially private (GDP).

Proof. First note that from Claim 3.4 it follows that if Algorithm Atest (Algorithm 1) outputs YES, then with probability at least $1-\gamma$, the candidate algorithm A is $(\alpha\theta, 0, \beta)$-GDP. To complete the proof, we consider the following two cases.

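• Case 1 [Algorithm 2 outputs $\mathcal{A}(\mathcal{D})$]: We define event $Ev$ to be the following: for every $o \in \Gamma$ there exists a set $W_o \subseteq T$ such that (i) $\lambda_o$ satisfies the $\theta$-Lipschitz condition for every $\mathcal{D}, \mathcal{D}' \in T \setminus W_o$ and (ii) $\Pr[x \in W_o] \le \beta$. As implied by the GDP guarantee, event $Ev$ holds with probability $1-\gamma$. Hence, we have the following for all $o \in \Gamma \cup \{\mathrm{FAILURE}\}$.

$$\begin{aligned}
\Pr[\mathcal{A}_{privGen}(\mathcal{D}) = o] &\le \Pr[\mathcal{A}_{privGen}(\mathcal{D}) = o \mid Ev] \Pr[Ev] + \Pr[\overline{Ev}] \\
&\le e^{\alpha\theta} \Pr[\mathcal{A}_{privGen}(\mathcal{D}') = o \mid Ev] \Pr[Ev] + \gamma \\
&\le e^{\alpha\theta} \Pr[\mathcal{A}_{privGen}(\mathcal{D}') = o \wedge Ev] + \gamma \\
&\le e^{\alpha\theta} \Pr[\mathcal{A}_{privGen}(\mathcal{D}') = o] + \gamma
\end{aligned}$$

• Case 2 [Algorithm 2 outputs FAILURE]: In this case, the output is trivially $(\alpha, \gamma, \beta)$-generalized differentially private since the output (i.e., FAILURE) is independent of the data set $\mathcal{D}$.

With this the proof is complete. □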

Claim 3.7 (Utility). If the candidate Algorithm A is $\alpha$-differentially private, then Algorithm AprivGen (Algorithm 2) always produces the output $\mathcal{A}(\mathcal{D})$.

The proof of the above claim follows from the fact that if the candidate algorithm A is $\alpha$-differentially private, then Atest will always output YES.






4 Lipschitz Property Testing over Hypercube domain 



In this section, we present a $(1+\delta)$-approximate Lipschitz tester (see Definition 3.2) for functions defined on $T = \{0,1\}^d$, where the notion of distance is with respect to any product distribution. Specifically, the points in the data set are distributed according to the product distribution $\Pi = \mathrm{Ber}(p_1) \times \mathrm{Ber}(p_2) \times \cdots \times \mathrm{Ber}(p_d)$, where $\mathrm{Ber}(p)$ denotes the Bernoulli distribution with probability $p$. For any vertex $x = (x_1, x_2, \ldots, x_d) \in T$, $x_i = 1$ with probability $p_i$ and $x_i = 0$ with probability $1 - p_i$. Each vertex $x \in T$ has an associated probability mass

$$p_x = p_{i_1} \cdot p_{i_2} \cdots p_{i_k} \cdot (1 - p_{j_1}) \cdot (1 - p_{j_2}) \cdots (1 - p_{j_{d-k}}),$$

where $k$ is the Hamming weight of $x$, also denoted by $H(x)$, $i_1, i_2, \ldots, i_k$ denote the indices of $x$ with bit-value 1, and $j_1, j_2, \ldots, j_{d-k}$ denote the remaining indices (those with bit-value 0).
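As a small illustration of the probability mass of a vertex under the product distribution, the following sketch enumerates a three-dimensional hypercube and checks that the masses sum to one. The parameter vector `p` and the function name `vertex_mass` are assumptions chosen for the example.

```python
from itertools import product
from math import prod
from typing import Dict, Sequence, Tuple

Vertex = Tuple[int, ...]

def vertex_mass(x: Sequence[int], p: Sequence[float]) -> float:
    """p_x: product of p_i over coordinates with x_i = 1 and (1 - p_i) otherwise."""
    return prod(pi if xi == 1 else 1.0 - pi for xi, pi in zip(x, p))

p = [0.5, 0.25, 0.8]  # illustrative Bernoulli parameters p_1, p_2, p_3
masses: Dict[Vertex, float] = {
    x: vertex_mass(x, p) for x in product((0, 1), repeat=len(p))
}
total_mass = sum(masses.values())
```

For instance, the vertex $(1, 0, 1)$ has Hamming weight 2 and mass $p_1 (1 - p_2) p_3$.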

In this section, we prove the following theorem, which gives a 1-approximate Lipschitz tester for $\delta\mathbb{Z}$-valued functions. A function is $\delta\mathbb{Z}$-valued if it produces outputs in integral multiples of $\delta$.

Theorem 4.1. Let $T = \{0,1\}^d$ be the domain from which the data sets are drawn according to a product probability distribution $\Pi = \mathrm{Ber}(p_1) \times \mathrm{Ber}(p_2) \times \cdots \times \mathrm{Ber}(p_d)$. The Lipschitz property of functions $f : T \to \delta\mathbb{Z}$ on these data sets can be tested non-adaptively and with one-sided error probability $\omega$ in $O\!\left(\frac{d \cdot \min\{d, \mathrm{ImD}(f)\}}{\delta(\epsilon - d^2\delta)} \ln(\ldots)\right)$ time for $\delta \in (0, 1]$. Here ImD is the image diameter defined in Definition 2.5.



The following is an easy corollary of the above, giving a $(1+\delta)$-approximate Lipschitz tester for $\mathbb{R}$-valued functions.

Corollary 4.2 (of Theorem 4.1). Let $T = \{0,1\}^d$ be the domain from which the data sets are drawn according to a product probability distribution $\Pi = \mathrm{Ber}(p_1) \times \mathrm{Ber}(p_2) \times \cdots \times \mathrm{Ber}(p_d)$. There is an algorithm that on input parameters $\delta \in (0, 1]$, $\epsilon \in (0, 1)$, $d$ and oracle access to a function $f : \{0,1\}^d \to \mathbb{R}$ has the following behavior: it accepts if $f$ is Lipschitz, rejects with probability at least $1 - \omega$ if $f$ is $\epsilon$-far (with respect to the distribution $\Pi$) from $(1+\delta)$-Lipschitz, and runs in $O(\ldots \ln(\ldots))$ time. Here ImD is the image diameter defined in Definition 2.5.



The proofs of the above theorem and corollary appear in Section 4.1. To state the proofs we need the following technical result.

We define a distribution on edges of the hypercube where the probability mass of an edge $\{x, y\}$ is given by $\frac{p_x + p_y}{d}$. Note that $\sum_{\{x,y\} \in E(H_d)} \frac{p_x + p_y}{d} = 1$. Thus the probability distribution (we call it $D_E$ henceforth) on the edges defined above is consistent. Our tester is based on detecting violated edges (that is, edges which violate the Lipschitz property) sampled from the distribution $D_E$. Our main technical lemma (Lemma 4.3) gives a lower bound on the probability of sampling a violated edge according to the distribution $D_E$ for a function that is $\epsilon$-far from Lipschitz. (Recall that $\epsilon$-far is measured with respect to the distribution $\Pi$.)
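The claim that the edge masses $\frac{p_x + p_y}{d}$ form a probability distribution can be verified by brute force on a small hypercube. The sketch below uses exact rational arithmetic; the parameter values and the names `vertex_mass` and `edge_masses` are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product
from math import prod
from typing import Dict, Sequence, Tuple

Vertex = Tuple[int, ...]

def vertex_mass(x: Sequence[int], p: Sequence[Fraction]) -> Fraction:
    """p_x under the product distribution Ber(p_1) x ... x Ber(p_d)."""
    return prod(pi if xi == 1 else 1 - pi for xi, pi in zip(x, p))

def edge_masses(p: Sequence[Fraction]) -> Dict[Tuple[Vertex, Vertex], Fraction]:
    """Mass (p_x + p_y) / d for every hypercube edge {x, y}."""
    d = len(p)
    masses = {}
    for x in product((0, 1), repeat=d):
        for i in range(d):
            if x[i] == 0:  # list each edge once, starting from its lower endpoint
                y = x[:i] + (1,) + x[i + 1:]
                masses[(x, y)] = (vertex_mass(x, p) + vertex_mass(y, p)) / d
    return masses

p = [Fraction(1, 2), Fraction(1, 4), Fraction(4, 5)]
d_e = edge_masses(p)
total = sum(d_e.values())
```

The total is exactly one because every vertex lies on $d$ edges, so each $p_x$ is counted $d$ times before dividing by $d$.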

Lemma 4.3. Let function $f : \{0,1\}^d \to \delta\mathbb{Z}$ be $\epsilon$-far from Lipschitz. Then

$$\sum_{\{x,y\} \in V(f)} \frac{p_x + p_y}{d} \ge \frac{\delta(\epsilon - d^2\delta)}{d \cdot \mathrm{ImD}(f)}.$$

Here ImD is the image diameter defined in Definition 2.5.

We prove the above lemma in Section 4.2.1.



4.1 Lipschitz tester 

In this section we prove Theorem 4.1 and Corollary 4.2. We first present the algorithm stated in Theorem 4.1 (Algorithm 3).

Proof of Theorem 4.1. First observe that if the input function $f$ is Lipschitz then Algorithm 3 always accepts. This is because a Lipschitz function $f$ has image diameter (see Definition 2.5) at most $d$ (and hence cannot be rejected in Step 4). Moreover, it does not have any violated edges (and hence cannot be rejected in Step 6). Next consider the case when $f$ is $\epsilon$-far from Lipschitz. Towards this we first extend Claim 3.1 of [JR11] about the sample diameter $r$ to our setting, where the distance (in particular, the notion of $\epsilon$-far) is measured with respect to the product distribution.

Algorithm 3 Lipschitz Tester

Require: Data domain $T = \{0,1\}^d$, product distribution on data set $\Pi = \mathrm{Ber}(p_1) \times \mathrm{Ber}(p_2) \times \cdots \times \mathrm{Ber}(p_d)$, failure probability parameter $\omega$, distance parameter $\epsilon'$, discretization parameter $\delta$
1: Set $\epsilon = \epsilon' - d^2\delta$.
2: Sample vertices $z_1, z_2, \ldots, z_t$ independently from $T$ according to the distribution $\Pi$
3: Let $r = \max_{i=1}^{t} f(z_i) - \min_{i=1}^{t} f(z_i)$
4: If $r > d$, reject
5: Sample edges independently, with each edge $(x, y)$ picked with probability $\frac{p_x + p_y}{d}$ from the hypercube $T$
6: If any of the sampled edges are violated, then reject, else accept
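The sampling steps of the tester can be sketched as follows. This is an illustration under stated assumptions, not the paper's exact procedure: the sample counts `num_vertices` and `num_edges` are hypothetical parameters (the listing above does not show how many samples are drawn), and an edge is drawn by sampling a vertex from the product distribution and flipping a uniformly random coordinate, which selects the edge $\{x, y\}$ with probability $\frac{p_x}{d} + \frac{p_y}{d}$.

```python
import random
from typing import Callable, Sequence, Tuple

Vertex = Tuple[int, ...]

def sample_vertex(p: Sequence[float], rng: random.Random) -> Vertex:
    """Draw a vertex from Ber(p_1) x ... x Ber(p_d)."""
    return tuple(1 if rng.random() < pi else 0 for pi in p)

def sample_edge(p: Sequence[float], rng: random.Random) -> Tuple[Vertex, Vertex]:
    """Draw an edge {x, y} with probability (p_x + p_y) / d."""
    x = sample_vertex(p, rng)
    i = rng.randrange(len(p))            # uniformly random dimension
    y = x[:i] + (1 - x[i],) + x[i + 1:]  # neighbour of x along dimension i
    return x, y

def lipschitz_tester(
    f: Callable[[Vertex], float],
    p: Sequence[float],
    num_vertices: int,   # hypothetical sample count for the diameter check
    num_edges: int,      # hypothetical sample count for the edge check
    rng: random.Random,
) -> bool:
    """Return True to accept and False to reject."""
    d = len(p)
    values = [f(sample_vertex(p, rng)) for _ in range(num_vertices)]
    if max(values) - min(values) > d:    # sample diameter exceeds d: reject
        return False
    for _ in range(num_edges):
        x, y = sample_edge(p, rng)
        if abs(f(x) - f(y)) > 1:         # violated edge found: reject
            return False
    return True

p = [0.5, 0.25, 0.8]
accepts_lipschitz = lipschitz_tester(sum, p, 20, 50, random.Random(0))
accepts_steep = lipschitz_tester(lambda x: 3 * sum(x), p, 20, 50, random.Random(0))
```

The Hamming weight function is Lipschitz and is therefore always accepted, matching the one-sided error of the tester; the function $3 H(x)$ violates every edge, so any sampled edge causes rejection.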



Claim 4.4. Steps 2 and 3 of the tester output $r \in \delta\mathbb{Z}$ such that $r \le \mathrm{ImD}(f)$ and, with probability at least $1 - \ldots$ (failure probability at most $\ldots$), $f$ is $\epsilon$-close to having diameter at most $r$.

Proof. Sort the points in $\{0,1\}^d$ according to the function value in non-decreasing order. Let $L$ be the set of the first $\ell$ points such that their probability mass sums up to $\ldots$ and $R$ be the set of the last $\ell'$ points such that their probability mass sums up to $\ldots$. The rest of the proof is very similar to the proof of Claim 3.1 in [JR11], so we omit the details here. □

Having established Claim 4.4, the rest of the proof is identical to [JR11] and we omit the details. □

Proof of Corollary 4.2. It is identical to the proof of Corollary 1.2 in [JR11] and we omit the details. □



4.2 Repair Operator and Proof of Lemma 4.3 



We show a transformation of an arbitrary function $f : \{0,1\}^d \to \delta\mathbb{Z}$ into a Lipschitz function by changing $f$ on certain points, whose probability mass is related to the probability mass (with respect to $D_E$) of the violated edges of $T$. This is achieved by repairing one dimension of $T$ at a time, as explained below. To achieve this, we define an asymmetric version of the basic operator of [JR11]. The operator redefines function values so that it reduces the gap asymmetrically according to the Hamming weights (and, in turn, the probability masses) of the endpoints of the violated edge. This is the main difference from previous approaches ([JR11], [AJMR12b]), which do not work if applied directly because the probability masses of the vertices vary with the Hamming weight. We first define the building block of the repair operator, which is called the asymmetric basic operator.

Definition 4.5 (Asymmetric basic operator). Given $f : \{0,1\}^d \to \delta\mathbb{Z}$, for each violated edge $\{x, y\}$ along dimension $i$, where $f(x) < f(y) - 1$, define $B_i$ as follows.

1. If $H(x) > H(y)$, then $B_i[f](x) = f(x) + (1 - p_i)\delta$ and $B_i[f](y) = f(y) - p_i\delta$

2. If $H(x) < H(y)$, then $B_i[f](x) = f(x) + p_i\delta$ and $B_i[f](y) = f(y) - (1 - p_i)\delta$
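A single application of the asymmetric basic operator to one edge can be written directly from the two cases above. The sketch uses exact rationals; the function name `basic_operator` and the example values are assumptions made for illustration.

```python
from fractions import Fraction
from typing import Tuple

def basic_operator(
    fx: Fraction, fy: Fraction, hx: int, hy: int, p_i: Fraction, delta: Fraction
) -> Tuple[Fraction, Fraction]:
    """Apply B_i once to an edge {x, y} labelled so that f(x) is the smaller value.

    hx and hy are the Hamming weights H(x) and H(y). Returns the new values
    (B_i[f](x), B_i[f](y)); a non-violated edge is left unchanged.
    """
    if not fx < fy - 1:
        return fx, fy
    if hx > hy:   # case 1: the smaller-valued endpoint has the larger weight
        return fx + (1 - p_i) * delta, fy - p_i * delta
    # case 2: the smaller-valued endpoint has the smaller weight
    return fx + p_i * delta, fy - (1 - p_i) * delta

p_i, delta = Fraction(1, 4), Fraction(1, 2)
case2 = basic_operator(Fraction(0), Fraction(2), 1, 2, p_i, delta)
case1 = basic_operator(Fraction(0), Fraction(2), 2, 1, p_i, delta)
untouched = basic_operator(Fraction(0), Fraction(1), 1, 2, p_i, delta)
```

In both cases the gap $f(y) - f(x)$ shrinks by exactly $\delta$, split between the endpoints in the proportions $p_i$ and $1 - p_i$.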

Now we define the repair operator. 

Definition 4.6 (Repair operator). Given $f : \{0,1\}^d \to \delta\mathbb{Z}$, $A_i[f]$ is obtained from $f$ by several applications of the asymmetric basic operator $B_i$ (see Definition 4.5) along dimension $i$ followed by a single application of the rounding operator. Specifically, let $f'$ be the function obtained from $f$ by applying $B_i$ repeatedly until there are no violated edges along the $i$-th dimension. Then $A_i[f]$ is defined to be $R[f']$, where the rounding operator $R$ rounds the function values to the closest $\delta\mathbb{Z}$-valued function.
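The repair operator along one dimension can be sketched as a loop over the edges of that dimension followed by rounding. This is a minimal illustration under assumptions: functions are stored as dictionaries over all vertices, ties in the rounding step are broken upwards (the definition above does not specify tie-breaking), and the names `round_to_grid` and `repair` are hypothetical.

```python
from fractions import Fraction
from math import floor
from typing import Dict, Tuple

Vertex = Tuple[int, ...]

def round_to_grid(v: Fraction, delta: Fraction) -> Fraction:
    """Round v to the closest multiple of delta (ties rounded up, an assumption)."""
    return floor(v / delta + Fraction(1, 2)) * delta

def repair(
    f: Dict[Vertex, Fraction], i: int, p_i: Fraction, delta: Fraction
) -> Dict[Vertex, Fraction]:
    """A_i[f]: apply B_i until dimension i has no violated edge, then round."""
    g = dict(f)
    for x in list(f):
        if x[i] != 0:
            continue                      # visit each edge from its lower endpoint
        y = x[:i] + (1,) + x[i + 1:]
        lo, hi = (x, y) if g[x] <= g[y] else (y, x)
        while g[hi] - g[lo] > 1:          # edge still violated
            if sum(lo) > sum(hi):         # case 1 of the basic operator
                up, down = (1 - p_i) * delta, p_i * delta
            else:                         # case 2 of the basic operator
                up, down = p_i * delta, (1 - p_i) * delta
            g[lo] += up
            g[hi] -= down
    return {v: round_to_grid(val, delta) for v, val in g.items()}

f = {(0, 0): Fraction(0), (1, 0): Fraction(2), (0, 1): Fraction(0), (1, 1): Fraction(1)}
repaired = repair(f, 0, Fraction(1, 4), Fraction(1, 2))
```

In this example the violated edge between $(0,0)$ and $(1,0)$ needs two applications of the basic operator, giving values $1/4$ and $5/4$ before rounding and $1/2$ and $3/2$ after.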

In effect, we have the following picture for the repair operation.

$$f = f_0 \xrightarrow{A_1} f_1 \xrightarrow{A_2} f_2 \rightarrow \cdots \rightarrow f_{d-1} \xrightarrow{A_d} f_d.$$

Now we define a measure called the violation score, which will be used to show the progress of the repair operation. As shown later, the violation score is approximately preserved along any dimension $j \ne i$ when we apply the repair operator to repair the edges along dimension $i$. Note that the violation score closely resembles the violation score in [JR11], except that it depends on the function value as well as the probability masses of the endpoints of the edge.

Definition 4.7. The violation score of an edge with respect to function $f$, denoted by $vs(\{x, y\})$, is $\max(0, (p_x + p_y)(|f(x) - f(y)| - 1))$. The violation score along dimension $i$, denoted by $VS^i(f)$, is the sum of violation scores of all edges along dimension $i$.

The violation score of an edge $\{x, y\}$ is positive iff it is violated, and the violation score of a violated edge of a $\delta\mathbb{Z}$-valued function is contained in the interval $[\delta(p_x + p_y), \mathrm{ImD}(f)(p_x + p_y)]$. Let $V^i(f)$ denote the set of edges along dimension $i$ violated by $f$. Then

$$\sum_{\{x,y\} \in V^i(f)} \delta(p_x + p_y) \le VS^i(f) \le \sum_{\{x,y\} \in V^i(f)} (p_x + p_y) \cdot \mathrm{ImD}(f). \qquad (3)$$
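The violation score along a dimension follows directly from the definition. The sketch below computes it exactly on a two-dimensional example; the function values, the parameters `p`, and the helper names are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import product
from math import prod
from typing import Dict, Sequence, Tuple

Vertex = Tuple[int, ...]

def vertex_mass(x: Sequence[int], p: Sequence[Fraction]) -> Fraction:
    """p_x under the product distribution."""
    return prod(pi if xi == 1 else 1 - pi for xi, pi in zip(x, p))

def violation_score(f: Dict[Vertex, Fraction], p: Sequence[Fraction], i: int) -> Fraction:
    """VS^i(f): sum over edges along dimension i of max(0, (p_x + p_y)(|f(x) - f(y)| - 1))."""
    total = Fraction(0)
    for x in product((0, 1), repeat=len(p)):
        if x[i] == 0:
            y = x[:i] + (1,) + x[i + 1:]
            weight = vertex_mass(x, p) + vertex_mass(y, p)
            total += max(Fraction(0), weight * (abs(f[x] - f[y]) - 1))
    return total

p = [Fraction(1, 4), Fraction(1, 2)]
f = {(0, 0): Fraction(0), (1, 0): Fraction(2), (0, 1): Fraction(0), (1, 1): Fraction(1)}
vs_dim0 = violation_score(f, p, 0)
vs_dim1 = violation_score(f, p, 1)
```

Here only the edge between $(0,0)$ and $(1,0)$ is violated; it has weight $p_x + p_y = 1/2$ and excess gap 1, so the score along dimension 0 is $1/2$, which lies between the two bounds of Equation (3) for $\delta = 1/2$ and image diameter 2.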



Lemma 4.9 shows that $A_i$ does not increase the violation score in dimensions other than $i$ by more than an additive $\delta$. The lemma makes use of the following claim.

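Claim 4.8 (Rounding is safe). Given $a, b \in \mathbb{R}$ satisfying $|a - b| \le 1$, let $a'$ (respectively, $b'$) be the value obtained by rounding $a$ (respectively, $b$) to the closest $\delta\mathbb{Z}$ value. Then $|a' - b'| \le 1$.

Proof. Assume without loss of generality that $a \le b$. For $x \in \mathbb{R}$, let $\lfloor x \rfloor_\delta$ be the largest value in $\delta\mathbb{Z}$ not greater than $x$. Observe that $a' \in \{\lfloor a \rfloor_\delta, \lfloor a \rfloor_\delta + \delta\}$. Using the fact that $\lfloor a \rfloor_\delta \le b' \le \lfloor a \rfloor_\delta + 1 + \delta$, we see that if $a' = \lfloor a \rfloor_\delta + \delta$ then $|b' - a'| \le 1$ always holds. Therefore, assume $a' = \lfloor a \rfloor_\delta$. This can happen only if $a < \lfloor a \rfloor_\delta + \delta/2$. The latter implies $b < \lfloor a \rfloor_\delta + 1 + \delta/2$ (using the fact that $b - a \le 1$). That is, $b' \ne \lfloor a \rfloor_\delta + 1 + \delta$. In other words, $b' \le \lfloor a \rfloor_\delta + 1$, again implying $b' - a' \le 1$, as required. □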
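The rounding claim can be spot-checked on random inputs. The sketch below makes two assumptions that are not stated in the claim itself: $1/\delta$ is an integer (so that 1 is a multiple of $\delta$), and ties are rounded upwards consistently. The names `round_to_grid` and `rounding_is_safe` are illustrative.

```python
import random
from fractions import Fraction
from math import floor

def round_to_grid(v: Fraction, delta: Fraction) -> Fraction:
    """Round v to the closest multiple of delta, with ties rounded up (an assumption)."""
    return floor(v / delta + Fraction(1, 2)) * delta

def rounding_is_safe(a: Fraction, b: Fraction, delta: Fraction) -> bool:
    """Check |a' - b'| <= 1 for the rounded values a', b'."""
    return abs(round_to_grid(a, delta) - round_to_grid(b, delta)) <= 1

rng = random.Random(0)
delta = Fraction(1, 4)          # 1/delta is an integer in this sketch
checks = []
for _ in range(1000):
    a = Fraction(rng.randint(-4000, 4000), 1000)
    b = a + Fraction(rng.randint(-1000, 1000), 1000)   # guarantees |a - b| <= 1
    checks.append(rounding_is_safe(a, b, delta))

# A tie case: a = delta/2 and b = a + 1 both sit exactly between grid points.
tie_case = rounding_is_safe(Fraction(1, 8), Fraction(9, 8), delta)
```

With consistent tie-breaking the rounding map is monotone and commutes with shifting by 1, which is why the check succeeds on every sampled pair.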

Lemma 4.9. For all $i, j \in [d]$, where $i \ne j$, and every function $f : \{0,1\}^d \to \delta\mathbb{Z}$, the following holds.

• (progress) Applying the repair operator $A_i$ does not introduce new violated edges in dimension $j$ if the dimension $j$ is violation free, i.e. $VS^j(f) = 0 \implies VS^j(A_i[f]) = 0$.

• (accounting) Applying the repair operator $A_i$ does not increase the violation score in dimension $j$ by more than $\delta$, i.e. $VS^j(A_i[f]) \le VS^j(f) + \delta$.

Proof. Let $f'$ be the function obtained from $f$ by applying $B_i$ repeatedly until there are no violated edges along the $i$-th dimension. We prove the following stronger claim to prove the lemma.

Claim 4.10. $VS^j(f') \le VS^j(f)$.

We prove the above claim momentarily, but first prove the lemma using it. The function $A_i[f]$ is obtained by rounding the values of $f'$ to the closest $\delta\mathbb{Z}$ values. Since rounding can never create new edge violations by Claim 4.8, we immediately get the first part of the lemma. The second part follows from the observation that the rounding step modifies each function value by at most $\delta/2$. Correspondingly, the violation score of an edge $\{x, y\}$ along the $j$-th dimension changes by at most $2 \cdot (\delta/2) \cdot (p_x + p_y)$, where the factor 2 comes because both endpoints of an edge may be rounded. Summing over all edges in the $j$-th dimension, we get that the increase in violation score is at most $\sum_{\{x,y\}} \delta \cdot (p_x + p_y) = \delta$, where the last equality holds because edges along the $j$-th dimension form a perfect matching and therefore the probabilities $p_x + p_y$ sum to 1.
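The perfect-matching fact used in the accounting step can be confirmed by enumeration: along any fixed dimension every vertex belongs to exactly one edge, so the edge weights $p_x + p_y$ add up to the total vertex mass. The parameters and helper names below are assumptions for illustration.

```python
from fractions import Fraction
from itertools import product
from math import prod
from typing import Sequence

def vertex_mass(x: Sequence[int], p: Sequence[Fraction]) -> Fraction:
    """p_x under the product distribution."""
    return prod(pi if xi == 1 else 1 - pi for xi, pi in zip(x, p))

def matching_mass(p: Sequence[Fraction], j: int) -> Fraction:
    """Sum of p_x + p_y over all edges {x, y} along dimension j."""
    total = Fraction(0)
    for x in product((0, 1), repeat=len(p)):
        if x[j] == 0:  # each edge is counted once, from its endpoint with x_j = 0
            y = x[:j] + (1,) + x[j + 1:]
            total += vertex_mass(x, p) + vertex_mass(y, p)
    return total

p = [Fraction(1, 2), Fraction(1, 4), Fraction(4, 5)]
sums = [matching_mass(p, j) for j in range(len(p))]
```

Multiplying each sum by $\delta$ gives the bound $\delta$ on the total increase caused by rounding.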



Proof of Claim 4.10. Following the outline of a similar proof in [JR11], we show that application of the asymmetric basic operator in dimension $i$ does not increase the violation score in dimension $j \ne i$. Standard arguments [GGL+00, DGL+99, JR11, AJMR12b] show that it is enough to analyze the effect of applying $B_i$ on one fixed disjoint square formed by adjacent edges that cross dimensions $i$ and $j$. (This is because edges along dimensions $i$ and $j$ form disjoint squares in the hypercube. So having established Claim 4.10 for one fixed square of the hypercube, the full claim follows by summing up the inequalities over all such squares.) Consider the two-dimensional function $f$ on $\{x_b, x_t, y_b, y_t\}$, where the vertices are positioned such that $H(y_t) = H(x_t) + 1 = H(y_b) + 1 = H(x_b) + 2$ and $H(x_b)$ denotes the Hamming weight of $x_b$. Assume that the basic operator is applied along the dimension $i$. We show that the violation score along dimension $j$ does not increase. Assume that the violation score along edge $\{x_b, x_t\}$ increases. First, assume that $B_i[f](x_t) > B_i[f](x_b)$. (The other case is very similar and we will prove it later.) Then $B_i$ increases $f(x_t)$ and/or decreases $f(x_b)$. Assume that $B_i$ increases $f(x_t)$. (The other case is symmetrical.) This implies that $\{x_t, y_t\}$ is violated and $f(x_t) < f(y_t)$. Let $f_k(x)$ (resp. $f_k(y)$) denote the value of $f(x)$ (resp. $f(y)$) after $k$ applications of $B_i$ on an edge $(x, y)$, for an integer $k \ge 0$. If $(x, y)$ is violated after $k - 1$ applications of the basic operator, then $f_k(x) = f_{k-1}(x) + p_i\delta$ and $f_k(y) = f_{k-1}(y) - (1 - p_i)\delta$; else $f_k(x) = f_{k-1}(x)$ and $f_k(y) = f_{k-1}(y)$. We will study the effect of applying $B_i$ on $(x_t, y_t)$ multiple (say $\lambda \ge 1$) times. Recall that the repair operator is applied only if the edge is violated. This means that



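$$\begin{aligned}
f_{\lambda-1}(x_t) &< f_{\lambda-1}(y_t) - 1 \\
\Leftrightarrow\ f(x_t) + (\lambda - 1) p_i \delta &< f(y_t) - (\lambda - 1)(1 - p_i)\delta - 1 \\
\Leftrightarrow\ f(x_t) + (\lambda - 1)\delta + 1 &< f(y_t) \\
\Leftrightarrow\ f(x_t) + \lambda\delta + 1 &\le f(y_t)
\end{aligned}$$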



The second inequality follows from the observation that since the edge is being corrected in the $\lambda$-th application, it must have been corrected in all previous applications as well. The last inequality follows from the fact that $f$ is a $\delta\mathbb{Z}$-valued function and $1/\delta$ is an integer. We subtract $(1 - p_i)(\lambda - 1)\delta$ from both sides of the above inequality and rearrange to obtain the following.



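$$\begin{aligned}
f(y_t) - (1 - p_i)(\lambda - 1)\delta &\ge f(x_t) + \lambda\delta + 1 - (1 - p_i)(\lambda - 1)\delta \\
f(y_t) - (1 - p_i)(\lambda - 1)\delta &\ge f(x_t) + (\lambda - 1) p_i \delta + 1 + \delta \\
\Leftrightarrow\ f_{\lambda-1}(y_t) &\ge f_{\lambda-1}(x_t) + 1 + \delta
\end{aligned}$$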



The above inequality is crucial for the remainder of the proof. Now consider two cases, according to whether or not the bottom edge is also violated.

If the bottom edge is not violated then we have $f_{\lambda-1}(x_b) \ge f_{\lambda-1}(y_b) - 1$, and $f_{\lambda-1}(x_b)$ and $f_{\lambda-1}(y_b)$ are not modified by the basic operator. Since $vs(\{x_t, x_b\})$ increases, $f_{\lambda-1}(x_t) > f_{\lambda-1}(x_b) + 1 - p_i\delta$. Combining the above inequalities, we get $f_{\lambda-1}(y_t) \ge f_{\lambda-1}(x_t) + 1 + \delta > f_{\lambda-1}(x_b) + 2 + (1 - p_i)\delta \ge f_{\lambda-1}(y_b) + 1 + (1 - p_i)\delta > f_{\lambda-1}(y_b) + 1$. Thus the violation score increases along $\{x_t, x_b\}$ by $(p_{x_b} + p_{x_t}) p_i \delta$ and decreases along $\{y_b, y_t\}$ by $(p_{y_b} + p_{y_t})(1 - p_i)\delta = (p_{x_b} + p_{x_t}) \left(\frac{p_i}{1 - p_i}\right)(1 - p_i)\delta$, which is the same as $(p_{x_b} + p_{x_t}) p_i \delta$, keeping the violation score along the dimension $j$ unchanged.

If the bottom edge is violated, then the increase in $vs(\{x_b, x_t\})$ implies that $f_{\lambda-1}(x_b)$ must decrease (after application of $B_i$) by $p_i\delta$ (since $H(x_b) < H(y_b)$), implying $f_{\lambda-1}(y_b) + 1 < f_{\lambda-1}(x_b)$. Therefore $f_{\lambda-1}(x_t) + p_i\delta > f_{\lambda-1}(x_b) + 1 - p_i\delta$, or $f_{\lambda-1}(x_t) > f_{\lambda-1}(x_b) + 1 - 2 p_i \delta$. Therefore $f_{\lambda-1}(y_t) \ge f_{\lambda-1}(x_t) + 1 + \delta > f_{\lambda-1}(x_b) + 2 - 2 p_i \delta + \delta > f_{\lambda-1}(y_b) + 3 - 2 p_i \delta + \delta > f_{\lambda-1}(y_b) + 1 + \delta$. The last inequality is true since $\delta \le 1$ and $p_i \le 1$. Thus, $vs(\{x_t, x_b\})$ increases by at most $(p_{x_b} + p_{x_t}) 2 p_i \delta$ while $vs(\{y_t, y_b\})$ decreases by $(p_{y_b} + p_{y_t}) 2 (1 - p_i)\delta = (p_{x_b} + p_{x_t}) 2 p_i \delta$, ensuring that the violation score along the vertical dimension does not increase.

Now we turn to the case when $B_i[f](x_t) < B_i[f](x_b)$. By arguments very similar to the first case, it can be proved that $f_{\lambda-1}(x_t) \ge f_{\lambda-1}(y_t) + 1 + \delta$ and that the application of the basic operator decreases $f(x_t)$ by $p_i\delta$ and increases $f(y_t)$ by $(1 - p_i)\delta$.

If the bottom edge is not violated then $f_{\lambda-1}(y_b) \ge f_{\lambda-1}(x_b) - 1$, and $f_{\lambda-1}(x_b)$ and $f_{\lambda-1}(y_b)$ are not modified by the basic operator. Since $vs(\{x_t, x_b\})$ increases, $f_{\lambda-1}(x_b) > f_{\lambda-1}(x_t) + 1 - p_i\delta$. Combining the above inequalities, we get $f_{\lambda-1}(y_b) \ge f_{\lambda-1}(x_b) - 1 > f_{\lambda-1}(x_t) - p_i\delta \ge f_{\lambda-1}(y_t) + 1 + \delta(1 - p_i)$. Thus the violation score increases along $\{x_t, x_b\}$ by $(p_{x_b} + p_{x_t}) p_i \delta$ and decreases along $\{y_b, y_t\}$ by $(p_{y_b} + p_{y_t})(1 - p_i)\delta = (p_{x_b} + p_{x_t}) \left(\frac{p_i}{1 - p_i}\right)(1 - p_i)\delta$, which is the same as $(p_{x_b} + p_{x_t}) p_i \delta$, keeping the violation score along the dimension $j$ unchanged.

If the bottom edge is violated, then the increase in $vs(\{x_b, x_t\})$ implies that $f_{\lambda-1}(x_b)$ must increase, implying $f_{\lambda-1}(y_b) > f_{\lambda-1}(x_b) + 1$. Therefore, the increase in $vs(\{x_b, x_t\})$ implies that $f_{\lambda-1}(x_b) + p_i\delta > f_{\lambda-1}(x_t) - p_i\delta + 1$, or $f_{\lambda-1}(x_b) > f_{\lambda-1}(x_t) - 2 p_i \delta + 1$. Combining the above inequalities, we get $f_{\lambda-1}(y_b) > f_{\lambda-1}(x_b) + 1 > f_{\lambda-1}(x_t) - 2 p_i \delta + 2 \ge f_{\lambda-1}(y_t) + 3 + \delta - 2 p_i \delta \ge f_{\lambda-1}(y_t) + 1 + \delta$. The last inequality is true since $\delta \le 1$ and $p_i \le 1$. Thus, $vs(\{x_t, x_b\})$ increases by at most $(p_{x_b} + p_{x_t}) 2 p_i \delta$ while $vs(\{y_t, y_b\})$ decreases by $(p_{y_b} + p_{y_t}) 2 (1 - p_i)\delta = (p_{x_b} + p_{x_t}) 2 p_i \delta$, ensuring that the violation score along the vertical dimension does not increase. □

□ 

4.2.1 Proof of Lemma 4.3

Using arguments very similar to [JR11], as given below, we get the following sequence of inequalities.

$$\begin{aligned}
\mathrm{Dist}(f_{i-1}, f_i) = \mathrm{Dist}(f_{i-1}, A_i[f_{i-1}]) &\le \sum_{\{x,y\} \in V^i(f_{i-1})} (p_x + p_y) \\
&\le \frac{1}{\delta} VS^i(f_{i-1}) \le \frac{1}{\delta} VS^i(f) + 2(d - i)\delta \le \frac{1}{\delta} \sum_{\{x,y\} \in V^i(f)} (p_x + p_y) \cdot \mathrm{ImD}(f) + 2(d - i)\delta
\end{aligned}$$

Here the functions $\{f_i\}$ are defined in the same way as in [JR11]. The first inequality holds because $A_i$ modifies $f$ only at the endpoints $x$ and $y$ of a violated edge $\{x, y\}$ along dimension $i$, thus paying $p_x + p_y$. The second and fourth inequalities follow from Equation (3), and the third inequality holds because of Lemma 4.9. Therefore, by the triangle inequality, we have



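$$\mathrm{Dist}(f, f_d) \le \sum_{i \in [d]} \mathrm{Dist}(f_{i-1}, f_i) \le \sum_{i \in [d]} \left( \sum_{\{x,y\} \in V^i(f)} (p_x + p_y) \cdot \frac{\mathrm{ImD}(f)}{\delta} + 2(d - i)\delta \right)$$

For a function which is $\epsilon$-far from Lipschitz, we have $\mathrm{Dist}(f, f_d) \ge \epsilon$. Moreover, $\sum_{i \in [d]} 2(d - i)\delta = d(d-1)\delta \le d^2\delta$, and the sets $V^i(f)$ over $i \in [d]$ partition the set $V(f)$ of all violated edges. Therefore, from the above inequality, we have

$$\sum_{\{x,y\} \in V(f)} \frac{p_x + p_y}{d} \ge \frac{\delta(\epsilon - d^2\delta)}{d \cdot \mathrm{ImD}(f)}.$$

5 Instantiation of privacy tester using Lipschitz testers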



In this section, we instantiate the privacy tester of Section 3 with both known Lipschitz testers and the Lipschitz tester developed in this work. This is presented in the table below. The third column gives the "approximation factor" as defined in Definition 3.2 for the various testers. The final column gives the privacy tester parameters that each of the testers achieves. The last row gives the result of the Lipschitz tester (Section 4) developed in this work.



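Reference | Functions | Approximation factor ($\theta$) | Distribution | Tester running time | Privacy tester
--- | --- | --- | --- | --- | ---
[JR11] | $\{0,1\}^d \to \mathbb{R}$ | $1+\delta$ | Uniform | … | $(1+\delta, \alpha, \gamma, \beta)$
[AJMR12b] | $\{1, \ldots, n\}^d \to \mathbb{R}$ | $1+\delta$ | Uniform | $O\!\left(\frac{d \cdot \min\{\mathrm{ImD}(f),\, nd\}}{\ldots}\right)$ | $(1+\delta, \alpha, \gamma, \beta)$
[CS12] | … | $1$ | Uniform | … | $(1, \alpha, \gamma, \beta)$
This work | $\{0,1\}^d \to \mathbb{R}$ | $1+\delta$ | Product | $O\!\left(\frac{d \cdot \mathrm{ImD}(f)}{(\epsilon - d^2\delta)\delta}\right)$ | $(1+\delta, \alpha, \gamma, \beta)$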



6 Discussions and Open Problems 

In this section we discuss some of the interesting implications of our current work and some of the new avenues it opens up. We also state some open problems that remain unresolved in our work.



Privacy: In this work, we took the first step towards designing efficient testing algorithms for statistical data privacy. Our work indicates that it is indeed possible to design efficient testing algorithms for some existing notions of statistical data privacy (e.g., generalized differential privacy). The current paper should be treated as an initial study of the problem and should in no way be interpreted as conclusive. It would be interesting to explore other rigorous notions of data privacy and their applications, and to design testers for them.

In this paper, we test for generalized differential privacy, which is a relaxation of differential privacy. It remains an open problem to design a privacy tester for exact differential privacy. The problem seems challenging because the utility guarantees of an efficient tester usually allow it to fail with some probability. Since differential privacy is a worst-case notion, it is not clear how to incorporate the failure probability of the tester and yet make precise claims about differential privacy.

In the current work, we have designed privacy testers for algorithms where the domain of the data sets is either the hypercube or the hypergrid. A natural question is whether the current results can be extended to design privacy testers when the data sets are drawn from a continuous domain, unlike the hypercube or the hypergrid.

Lipschitz Testing: This work presents the first Lipschitz property tester for the setting where the domain points 
are sampled from a distribution that is not uniform. Because of possible applications to statistical data privacy, this 
work has motivated the design of such Lipschitz testers for other domains, e.g. hypergrid. Also, this paper mainly 
shows the tester for the product distribution over the hypercube domain, but it still remains open to design testers 
for other distributions that may be correlated in some way (e.g., pairwise correlation). 

Acknowledgements: We would like to thank Sofya Raskhodnikova and Adam Smith for various suggestions
and comments during the course of this project.